Machine Learning
MGMT 675
AI-Assisted Financial Analysis
Kerry Back

Machine learning in finance
- Fraud detection
- Credit risk analysis
- Return prediction
- Valuation
- Text analysis
- Time series forecasting
Models
- Linear
- Trees
- Neural networks
- Others
Regression vs Classification
- Regression means predicting a continuous variable (not necessarily by linear regression).
- Classification means predicting a categorical variable, which may be binary or multiclass.
Train and Test
- Training means fitting a model (like linear regression).
- The objective is to make accurate predictions on new data.
- To assess performance, we have to check the model on “new data” (data not used in training).
- Randomly split the data into training and test subsets. Fit the model on the training subset and evaluate it on the test subset.
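A random train-test split can be sketched in a few lines of Python. This is only an illustration of the idea: the function name `train_test_split`, the 20% test fraction, and the seed are assumptions made for the example, not something the slides specify.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Randomly split rows into (train, test) lists."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(rows)          # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled rows become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))            # stand-in for 100 observations
train, test = train_test_split(rows, test_fraction=0.2)
print(len(train), len(test))
```

Every observation lands in exactly one of the two subsets, so the test set really is data the model has not seen.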
Test criteria
- How do we decide if performance is good or bad?
- For continuous variables, we usually want a low sum of squared errors on the test data or, equivalently, a high \(R^2\):
\[R^2 = 1 - \frac{\sum (y_i - \hat y_i)^2}{\sum (y_i - \bar{y})^2}\]
- For categorical variables, the criteria are also based on prediction errors, for example the fraction of observations classified incorrectly.
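The \(R^2\) formula above translates directly into code. The sketch below uses made-up numbers, and the helper names `r_squared` and `error_rate` are hypothetical; the error rate is shown as one possible criterion for a categorical target.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - (sum of squared errors) / (sum of squared deviations from the mean)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

def error_rate(labels, predictions):
    """Fraction of observations whose predicted category is wrong."""
    wrong = sum(1 for a, b in zip(labels, predictions) if a != b)
    return wrong / len(labels)

# Made-up values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
r2 = r_squared(y_true, y_pred)     # squared errors sum to 1, total variation is 5
rate = error_rate(["a", "b", "a", "b"], ["a", "b", "b", "b"])
print(r2, rate)
```

Because the denominator depends only on the actual values, a lower sum of squared errors on a given test set always means a higher \(R^2\).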
Example data
- Download ml1.xlsx from the course website
- Upload it to Julius and ask Julius to read it and describe it.
- The data was created by generating 51 sets of 100 standard normals.
- The first 50 sets are labeled x1, …, x50.
- The last set was used as the noise to generate y1 as x1 + noise.
- So, x2, …, x50 are irrelevant for y1.
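Data of this shape could be generated along the following lines. This is a sketch under assumptions: it uses NumPy, the seed is arbitrary, and it will not reproduce the actual numbers in ml1.xlsx.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed (assumption)
draws = rng.standard_normal((100, 51))    # 51 sets (columns) of 100 standard normals

X = draws[:, :50]                         # first 50 sets: x1, ..., x50
noise = draws[:, 50]                      # last set: the noise
y1 = X[:, 0] + noise                      # y1 = x1 + noise; x2, ..., x50 play no role

print(X.shape, y1.shape)
```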
Linear regression example
- Ask Julius to do a train-test split of the data with 20% of the data in the test set.
- Ask Julius to train a linear regression on the training data with x1, …, x50 as the features and y1 as the target.
- Ask Julius to compute the R-squared on the test data.
- Ask Julius to report the parameter estimates.
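For reference, the four steps above might look roughly like this in Python. It is a sketch, not the code Julius will produce: it simulates data with the same structure instead of reading ml1.xlsx, and it fits the regression with NumPy least squares rather than whatever library Julius chooses. The seed and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)                 # arbitrary seed (assumption)
X = rng.standard_normal((100, 50))             # simulated x1, ..., x50
y1 = X[:, 0] + rng.standard_normal(100)        # y1 = x1 + noise

# Step 1: random split with 20% of the rows in the test set.
idx = rng.permutation(100)
test_idx, train_idx = idx[:20], idx[20:]
X_train, y_train = X[train_idx], y1[train_idx]
X_test, y_test = X[test_idx], y1[test_idx]

# Step 2: linear regression (with intercept) fitted on the training data only.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
A_test = np.column_stack([np.ones(len(X_test)), X_test])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# Step 3: R-squared on the test data.
y_hat = A_test @ coef
ss_res = np.sum((y_test - y_hat) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_test = 1 - ss_res / ss_tot

# Step 4: parameter estimates (intercept first, then x1, ..., x50).
print("test R-squared:", r2_test)
print("intercept:", coef[0])
print("coefficient on x1:", coef[1])
```

No output values are quoted here because they depend on the seed and on the particular split.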